
@geoffreyclaude (Contributor) commented Sep 30, 2025

Which issue does this PR close?

Rationale for this change

DataFusion currently lacks a clean way to customize how SQL table factors (FROM clause elements) are planned into logical plans. The proposed workaround in #17633 has a critical limitation: it only works at the query root level and cannot handle custom relations inside JOINs, CTEs, or subqueries.

This PR introduces a RelationPlanner extension API that allows users to intercept and customize table factor planning at any nesting level, enabling support for SQL syntax extensions that go beyond simple table-valued functions. For example, you can now combine multiple custom relation types in a single query:

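```rust
use std::sync::Arc;

use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    // Assumes a `stock_prices` table is registered on `ctx` before the query runs (not shown).

    // Register custom planners for SQL syntax extensions.
    // TableSamplePlanner, MatchRecognizePlanner and PivotUnpivotPlanner are
    // user-defined RelationPlanner implementations, not built into DataFusion.
    ctx.register_relation_planner(Arc::new(TableSamplePlanner))?;
    ctx.register_relation_planner(Arc::new(MatchRecognizePlanner))?;
    ctx.register_relation_planner(Arc::new(PivotUnpivotPlanner))?;

    // Use multiple custom table modifiers together, even in nested JOINs or CTEs.
    // PIVOT consumes `quarter` and `price`, so the outer query selects its output columns.
    let df = ctx.sql(r#"
        WITH sampled_data AS (
            SELECT * FROM stock_prices
            TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)
        )
        SELECT *
        FROM sampled_data
        MATCH_RECOGNIZE (
            PARTITION BY symbol ORDER BY time
            MEASURES LAST(price) AS price, LAST(quarter) AS quarter
            PATTERN (UP+ DOWN+)
            DEFINE
                UP AS price > PREV(price),
                DOWN AS price < PREV(price)
        ) AS patterns
        PIVOT (
            AVG(price) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
        ) AS pivoted
    "#).await?;

    df.show().await?;
    Ok(())
}
```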

df.show().await?;

Why not use TableFunctionImpl? The existing TableFunctionImpl trait is perfect for simple table-valued functions (like generate_series(1, 10)), but it cannot handle:

  • SQL clause modifiers (e.g., TABLESAMPLE that modifies an existing table reference)
  • New table factor syntaxes (e.g., MATCH_RECOGNIZE, PIVOT, UNPIVOT)
  • Complex syntax that doesn't follow the function call pattern

RelationPlanner fills this gap by intercepting arbitrary TableFactor AST nodes and transforming them into logical plans.
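To make the interception idea concrete, here is a dependency-free toy model. It is a hypothetical sketch, not the PR's code: `TableFactor`, `Plan`, `Planning`, `RelationPlanner` and `SamplePlanner` below are simplified stand-ins invented for illustration, and their shapes are assumptions rather than the actual DataFusion signatures. The planner claims a TABLESAMPLE-modified table reference, which has no function-call shape that a TableFunctionImpl could match, and hands every other factor back unchanged so default planning can continue.

```rust
// Hypothetical, simplified model of relation planning; not DataFusion's real types.

/// Simplified stand-in for a FROM-clause AST node.
#[derive(Debug, Clone, PartialEq)]
enum TableFactor {
    /// A plain table reference, e.g. `stock_prices`.
    Table { name: String },
    /// A table reference followed by `TABLESAMPLE BERNOULLI(<percent> PERCENT)`.
    Sampled { name: String, percent: f64 },
}

/// Simplified stand-in for a logical plan node.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    /// Keep each input row independently with probability `fraction`.
    Sample { fraction: f64, input: Box<Plan> },
}

/// Outcome of offering a factor to a planner: it either produced a plan,
/// or declined and returned the factor untouched.
#[derive(Debug, PartialEq)]
enum Planning {
    Planned(Plan),
    Original(TableFactor),
}

trait RelationPlanner {
    fn plan_relation(&self, factor: TableFactor) -> Planning;
}

/// Hypothetical planner that only understands TABLESAMPLE.
struct SamplePlanner;

impl RelationPlanner for SamplePlanner {
    fn plan_relation(&self, factor: TableFactor) -> Planning {
        match factor {
            TableFactor::Sampled { name, percent } => Planning::Planned(Plan::Sample {
                fraction: percent / 100.0,
                input: Box::new(Plan::Scan { table: name }),
            }),
            // Not ours: give ownership back so another planner (or the default) can handle it.
            other => Planning::Original(other),
        }
    }
}

fn main() {
    let planner = SamplePlanner;
    let sampled = TableFactor::Sampled { name: "stock_prices".into(), percent: 10.0 };
    println!("{:?}", planner.plan_relation(sampled));
    let plain = TableFactor::Table { name: "stock_prices".into() };
    println!("{:?}", planner.plan_relation(plain));
}
```

Returning the declined factor by value, rather than a bare "not handled" flag, lets the caller pass it on to the next planner without cloning the AST.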

What changes are included in this PR?

Core API (feat commit):

  • New RelationPlanner trait for customizing SQL table factor planning
  • RelationPlannerContext trait providing SQL utilities to extension planners
  • SessionContext::register_relation_planner() for registering custom planners
  • SessionState integration with a priority-based planner chain
  • Integration into SqlToRel to invoke planners at all nesting levels (not just root)
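The chain and nesting behaviour can be modelled the same way. This is again a hypothetical sketch with invented, simplified types; in particular, treating the first planner in the slice as the highest priority and letting the first planner that claims a factor win are assumptions made for illustration. What it demonstrates is that the chain is consulted again for each side of a join, so a custom factor is intercepted wherever it appears, not only at the query root.

```rust
// Hypothetical, simplified model of a priority-ordered relation planner chain.
// None of these types are DataFusion's; they only illustrate the control flow.

/// Simplified FROM-clause AST (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum TableFactor {
    Table(String),
    /// A custom syntax node that only an extension planner understands.
    Custom(String),
    Join(Box<TableFactor>, Box<TableFactor>),
}

/// Simplified logical plan (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    /// A node produced by an extension planner, tagged with that planner's id.
    Extension { planner: &'static str, name: String },
    Join(Box<Plan>, Box<Plan>),
}

enum Planning {
    Planned(Plan),
    Original(TableFactor),
}

trait RelationPlanner {
    fn plan_relation(&self, factor: TableFactor) -> Planning;
}

/// Claims `Custom` factors whose text starts with `prefix`; an empty prefix claims them all.
struct PrefixPlanner {
    id: &'static str,
    prefix: &'static str,
}

impl RelationPlanner for PrefixPlanner {
    fn plan_relation(&self, factor: TableFactor) -> Planning {
        match factor {
            TableFactor::Custom(name) if name.starts_with(self.prefix) => {
                Planning::Planned(Plan::Extension { planner: self.id, name })
            }
            other => Planning::Original(other),
        }
    }
}

/// Offers `factor` to each planner in slice order (assumed highest priority first);
/// the first planner to return `Planned` wins. If all decline, built-in planning
/// runs, and joins recurse so nested factors go through the whole chain again.
fn plan_factor(planners: &[Box<dyn RelationPlanner>], mut factor: TableFactor) -> Result<Plan, String> {
    for planner in planners {
        match planner.plan_relation(factor) {
            Planning::Planned(plan) => return Ok(plan),
            Planning::Original(declined) => factor = declined,
        }
    }
    match factor {
        TableFactor::Table(name) => Ok(Plan::Scan(name)),
        TableFactor::Join(left, right) => Ok(Plan::Join(
            Box::new(plan_factor(planners, *left)?),
            Box::new(plan_factor(planners, *right)?),
        )),
        TableFactor::Custom(name) => Err(format!("no planner for custom relation `{name}`")),
    }
}

fn main() {
    let planners: Vec<Box<dyn RelationPlanner>> = vec![
        Box::new(PrefixPlanner { id: "tablesample", prefix: "TABLESAMPLE" }),
        Box::new(PrefixPlanner { id: "catch_all", prefix: "" }),
    ];
    // The custom nodes sit inside a nested join, not at the root.
    let query = TableFactor::Join(
        Box::new(TableFactor::Table("stock_prices".into())),
        Box::new(TableFactor::Join(
            Box::new(TableFactor::Custom("TABLESAMPLE trades".into())),
            Box::new(TableFactor::Custom("MATCH_RECOGNIZE quotes".into())),
        )),
    );
    println!("{:?}", plan_factor(&planners, query));
}
```

Running it prints a `Join` plan in which the nested `TABLESAMPLE` node was claimed by the `tablesample` planner and the `MATCH_RECOGNIZE` node fell through to the catch-all planner.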

Tests (test commit):

  • Comprehensive tests in datafusion/core/tests/user_defined/relation_planner.rs
  • Coverage for basic planner registration, priority ordering, and nested relations
  • Tests demonstrating custom table factors and syntax extensions

Examples (example commit):

Note: The examples are intentionally verbose to demonstrate the full design and capabilities of the API. They should be simplified and streamlined before merging (reduce duplication, extract common patterns, improve documentation structure).

Are these changes tested?

Yes:

  • Unit tests for planner registration, priority ordering, and chaining
  • Integration tests demonstrating nested relation handling (JOINs, CTEs, subqueries)
  • Example programs serve as additional end-to-end tests
  • All examples include multiple test cases showing different usage patterns
  • Examples demonstrate syntax that cannot be implemented with TableFunctionImpl

Are there any user-facing changes?

Yes, this is a new public API:

New APIs:

  • datafusion_expr::planner::RelationPlanner trait
  • datafusion_expr::planner::RelationPlannerContext trait
  • datafusion_expr::planner::PlannedRelation struct
  • datafusion_expr::planner::RelationPlanning enum
  • SessionContext::register_relation_planner()
  • SessionState::register_relation_planner() and relation_planners()
  • SessionStateBuilder::with_relation_planners()
  • ContextProvider::get_relation_planners()

This is an additive change that extends DataFusion's existing family of planner extension APIs (ExprPlanner, TypePlanner) and requires the sql feature flag.

AI-Generated Code Disclosure

This PR was developed with significant assistance from claude-sonnet-4.5. The AI was heavily involved in every part of the process, from the initial design to the code itself to writing this PR description, which greatly sped up the work. All of its output was, however, carefully reviewed before submission.

@github-actions bot added the sql (SQL Planner), logical-expr (Logical plan and expressions), and core (Core DataFusion crate) labels on Sep 30, 2025
@geoffreyclaude force-pushed the feat/custom_relation_planner branch 2 times, most recently from df4f347 to 6865772 (September 30, 2025 15:06)
@geoffreyclaude force-pushed the feat/custom_relation_planner branch 5 times, most recently from 65fde95 to 952f955 (October 1, 2025 11:57)
@theirix (Contributor) commented Oct 1, 2025

I just had a first glance – the RelationPlanner is an excellent idea! It allows extending the SqlToRel implementation in an external module. The user code looks the same with a well-known let df = ctx.sql(sql).await? statement. And extending the SQL API is shifted to configuring the context. I think it aligns well with the existing extensibility API.

@geoffreyclaude force-pushed the feat/custom_relation_planner branch 4 times, most recently from 952f955 to 25d9ce4 (October 13, 2025 08:23)
@geoffreyclaude force-pushed the feat/custom_relation_planner branch from 25d9ce4 to 878fe3d (October 15, 2025 08:13)
@geoffreyclaude marked this pull request as ready for review (October 15, 2025 14:54)
@geoffreyclaude force-pushed the feat/custom_relation_planner branch from 878fe3d to 3947401 (October 16, 2025 13:21)
@geoffreyclaude force-pushed the feat/custom_relation_planner branch from 3947401 to c1be1e1 (October 16, 2025 14:06)

Development

Successfully merging this pull request may close these issues.

Relation Planner Extension API
